AMENDMENTS TO THE CLAIMS:

This listing of the claims will replace all prior versions, and listings, of the claims in this 
application. 

Listing of Claims: 

1. (Currently Amended) A method to process at least one text document, comprising:

partitioning text of each of the at least one text document and assigning semantic meaning to 
words of the partitioned text, where assigning comprises applying a plurality of regular 
expressions, rules and dictionaries comprising a common chemical prefix dictionary and a 
common chemical suffix dictionary to recognize chemical name fragments; 

recognizing any substructures present in the chemical name fragments; 

extracting keywords associated with the recognized chemical name fragments and the 
substructures of the text document and indexing the extracted keywords in a text index; 

adding each of the recognized chemical name fragments and the substructures that do not contain 
a number to the text index; 

determining structural connectivity information of each of the recognized chemical name 
fragments and the substructures that do not contain a number; 

indexing representations of the recognized chemical name fragments and the substructures in
association with the determined structural connectivity information into a plurality of chemical
connectivity tables of a chemical substructure index, where indexing the representations
comprises:

in a loop, testing each of the recognized chemical name fragments in a first text document 
of the at least one text document to see if the recognized chemical name fragment occurs 
in a dictionary of SMILES fragments, where if it does then a SMILES expression for the
fragment token is added to the chemical substructure index, then 

determining if the recognized chemical name fragment occurs in a MOL file dictionary, 
where if it does then a MOL file expression for the fragment token is added to the 
chemical substructure index, and then 

determining if there is a next text document of the at least one text document, where if 
there is a next text document then testing, as stated above, each of the recognized 
chemical name fragments in the next text document and where if there is no next text 
document the indexing is completed;

storing the text index in association with the chemical substructure index; 

providing a graphical user interface to search the text index and the chemical substructure index,
where the search comprises first entering search terms comprising one or more chemical 
fragment names and then selecting graphical representations of one or more substructures, where 
the selecting comprises using the graphical user interface as a pointer to a graphical list of 
substructures; and 

receiving a search result, where the search result is an intersection of the chemical 
substructure index and the text index, identifying at least one document where there are found 
chemical compounds that contain the selected substructures, and connectivity specified by the 
one or more chemical fragment names and the selected substructures. 
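
For illustration only, the indexing loop recited above can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the names `SMILES_FRAGMENTS`, `MOL_FRAGMENTS` and `index_fragments`, the dictionary contents, and the choice of a mapping from representation to document identifiers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fragment dictionaries; the entries are illustrative placeholders.
SMILES_FRAGMENTS = {"methyl": "C", "ethyl": "CC", "phenyl": "c1ccccc1"}
MOL_FRAGMENTS = {"phenyl": "<MOL block for phenyl>"}  # placeholder, not a real MOL file


def index_fragments(documents):
    """Build a substructure index from {doc_id: [recognized fragment names]}."""
    substructure_index = defaultdict(set)
    for doc_id, fragments in documents.items():  # moves on to the "next text document"
        for fragment in fragments:
            smiles = SMILES_FRAGMENTS.get(fragment)
            if smiles is not None:  # SMILES fragment dictionary is tested first
                substructure_index[("smiles", smiles)].add(doc_id)
            mol = MOL_FRAGMENTS.get(fragment)
            if mol is not None:  # then the MOL file dictionary is tested
                substructure_index[("mol", mol)].add(doc_id)
    return dict(substructure_index)  # no next document: indexing is completed
```

Fragments found in neither dictionary are simply skipped in this sketch; how such fragments are handled in practice is not specified by the claim.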

2. (Previously Presented) The method as in claim 1, wherein the search further comprises first 
entering search terms comprising the one or more chemical fragment names and entering at least 
one keyword, and where the search result is identifying at least one document where there are 
found the at least one keyword, the chemical compounds that contain the selected substructures, 
and the connectivity specified by the one or more chemical fragment names and the selected 
substructures. 
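
For illustration only, a search result formed as an intersection of the text index and the chemical substructure index, optionally narrowed by keywords, might be computed as sketched below. The function name `search`, its parameters, and the representation of each index as a mapping from term to a set of document identifiers are assumptions made for this sketch.

```python
def search(text_index, substructure_index, fragment_names, substructures, keywords=()):
    """Return ids of documents matching every fragment name, substructure and keyword.

    Both indexes are assumed to map a term to a set of document ids.
    """
    # Fragment names and keywords are looked up in the text index.
    posting_lists = [text_index.get(term, set()) for term in (*fragment_names, *keywords)]
    # Selected substructures are looked up in the substructure index.
    posting_lists += [substructure_index.get(s, set()) for s in substructures]
    if not posting_lists:
        return set()
    return set.intersection(*posting_lists)
```

A term missing from either index contributes an empty set, so the intersection is empty and no document is identified.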

3. (Previously Presented) A method as in claim 1 performed by executing a computer program 
product. 

4.-6. (Cancelled)

7. (Previously Presented) The method as in claim 1, where determining structural connectivity
information comprises looking up recognized chemical name fragments and substructures in a 
structure dictionary. 

8. (Cancelled) The method as in claim 1, where the indexing representations of the recognized
chemical name fragments and the substructures comprises: 

testing if each of the recognized chemical name fragments occur in a SMILES fragment 
dictionary, where if it does occur in the SMILES fragment dictionary then adding the chemical 
name fragment to the chemical substructure index as the SMILES representation, and 
testing if each of the recognized chemical name fragments occur in a MOL file fragment 
dictionary, where if it does occur in the MOL file dictionary then adding the chemical name 
fragments to the chemical substructure index as the MOL file representation. 

9. (Previously Presented) The method as in claim 1, where said plurality of dictionaries consists
of the dictionary of common chemical prefixes and the dictionary of common chemical suffixes. 

10. (Previously Presented) The method as in claim 1, where said plurality of dictionaries consists
of the common chemical prefix dictionary and the common chemical suffix dictionary, and a 
dictionary of stop words to eliminate erroneous chemical name fragments. 

11. (Previously Presented) The method as in claim 1, further comprising filtering recognized 
chemical name fragments using a list of stop words to eliminate erroneous chemical name
fragments. 

12. (Previously Presented) The method as in claim 1, where chemical name fragments are further
recognized by using common chemical word endings. 

13. (Previously Presented) The method as in claim 1, where application of said regular 
expressions and rules results in punctuation characters being one of maintained or removed from 
between chemical name fragments as a function of context. 

14. (Previously Presented) The method as in claim 1, where said regular expressions comprise a
plurality of patterns, individual ones of which are comprised of at least one of characters, 
numbers and punctuation. 

15. (Previously Presented) The method as in claim 14, where the punctuation comprises at least 
one of a parenthesis, a square bracket, a hyphen, a colon and a semi-colon. 

16. (Previously Presented) The method as in claim 14, where the characters comprise upper case
C, O, R, N and H. 

17. (Previously Presented) The method as in claim 14, where the characters comprise lower case
xy, ene, ine, yl, ane and oic. 
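
For illustration only, recognition by regular expression, word ending and stop-word filtering might look like the sketch below. The suffix strings are those listed in claim 17 and the punctuation is that listed in claim 15; the pattern `TOKEN`, the stop list `STOP_WORDS` and the function `recognize_fragments` are hypothetical and far simpler than a practical rule set.

```python
import re

# Word endings listed in claim 17; using them as suffix tests is an assumption.
SUFFIXES = ("xy", "ene", "ine", "yl", "ane", "oic")
# Letters, digits and the punctuation of claim 15 (parenthesis, square bracket,
# hyphen, colon, semi-colon); the exact pattern is hypothetical.
TOKEN = re.compile(r"[A-Za-z0-9()\[\]:;-]+")
# Hypothetical stop words: ordinary words that happen to end in a chemical suffix.
STOP_WORDS = {"machine", "line", "scene"}


def recognize_fragments(text):
    """Return lowercase tokens that end in a listed suffix and are not stop words."""
    found = []
    for token in TOKEN.findall(text):
        word = token.lower().rstrip(";:")  # drop trailing sentence punctuation
        if word.endswith(SUFFIXES) and word not in STOP_WORDS:
            found.append(word)
    return found
```

Applied to the sentence "The machine added 2-methoxy benzene and ethyl groups; toluene too.", this sketch keeps "2-methoxy", "benzene", "ethyl" and "toluene" and rejects "machine" through the stop list.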

18. (Previously Presented) The method as in claim 1, comprising an initial step of tokenizing the
document to provide a sequence of tokens. 

19. (Currently Amended) A system having at least one computer, comprising: 

a tokenizer module and a token processing module comprised of computer
instructions in data storage distributed across the at least one computer directing the at least one
computer to partition text of each of at least one text document and to assign semantic
meaning to words of the partitioned text by applying a plurality of regular expressions, rules and 
dictionaries comprising a common chemical prefix dictionary and a common chemical suffix 
dictionary to recognize chemical name fragments; 

the instructions of the token processing module directing the at least one computer to
recognize any substructures present in the chemical name fragments; 

the instructions of the token processing module directing the at least one computer to
extract keywords associated with the recognized chemical name fragments and the substructures 
of the text document and to index the extracted keywords in a text index; 

the instructions of the token processing module directing the at least one computer to
add each of the recognized chemical name fragments and the substructures that do not contain a
number to the text index;

the instructions of the token processing module directing the at least one computer to
determine structural connectivity information of each of the recognized chemical name fragments
and the substructures that do not contain a number, and to index representations of the recognized
chemical name fragments and the substructures in association with the determined structural
connectivity information into a plurality of chemical connectivity tables of a chemical
substructure index, where indexing the representations comprises:

in a loop, testing each of the recognized chemical name fragments in a first text document 
of the at least one text document to see if the recognized chemical name fragment occurs
in a dictionary of SMILES fragments, where if it does then a SMILES expression for the
fragment token is added to the chemical substructure index, then 

determining if the recognized chemical name fragment occurs in a MOL file dictionary, 
where if it does then a MOL file expression for the fragment token is added to the 
chemical substructure index, and then 

determining if there is a next text document of the at least one text document, where if 
there is a next text document then testing, as stated above, each of the recognized 
chemical name fragments in the next text document and where if there is no next text
document the indexing is completed;

the instructions of the token processing module directing the at least one computer to
store the text index in association with the chemical substructure index; 

a searcher module comprised of computer instructions distributed across the at least one
computer and a graphical user interface comprised of a display and a keyboard connected to a
computer of the at least one computer directing the at least one computer to search the
text index and the chemical substructure index, where the search comprises first entering one or 
more chemical fragment names and then selecting graphical representations of one or more 
substructures, where the selecting comprises using the graphical user interface as a pointer to a 
graphical list of substructures; and 

the graphical user interface configured to receive a search result, where the search result is an 
intersection of the chemical substructure index and the text index, identifying at least one 
document where there are found chemical compounds that contain the selected substructures, and 
connectivity specified by the one or more chemical fragment names and the selected 
substructures. 

20. (Previously Presented) The system as in claim 19, wherein the search further comprises first 
entering the one or more chemical fragment names and additionally entering at least one 
keyword, and where the search result is identifying at least one document where there are found 
the at least one keyword, the chemical compounds that contain the selected substructures, and the 
connectivity specified by the one or more chemical fragment names and the selected
substructures. 

21.-24. (Cancelled)

25. (Currently Amended) The system as in claim 19, where the instructions of said token
processing module that directs the at least one computer to determine the structural
connectivity information directs the at least one computer to look up
recognized fragments and substructures in a structure dictionary.

26. (Cancelled) The system as in claim 19, where the instructions of the token processing module
that directs the at least one computer to index representations
directs the at least one computer to test if each of the recognized chemical name fragments occur
in a SMILES fragment dictionary, where if it does occur in the SMILES fragment dictionary
the token processing module directs the at least one
computer to add the chemical name fragment to the chemical substructure index as the SMILES
representation, and

test if each of the recognized chemical name fragments occur in a MOL file fragment
dictionary, where if it does occur in the MOL file dictionary the token processing module is
configured to add the chemical name fragments to the chemical substructure index as the MOL
file representation.

27. (Previously Presented) The system as in claim 19, where said plurality of dictionaries consists
of the dictionary of common chemical prefixes and the dictionary of common chemical suffixes. 

28. (Previously Presented) The system as in claim 19, where said plurality of dictionaries consists
of the dictionary of common chemical prefixes, the dictionary of common chemical suffixes, and 
a dictionary of stop words to eliminate erroneous chemical name fragments. 

29. (Currently Amended) The system as in claim 19, further comprising the instructions of said
token processing module directs the at least one computer to filter
recognized chemical name fragments using a list of stop words to eliminate erroneous chemical 
name fragments. 

30. (Currently Amended) The system as in claim 19, where the instructions of the tokenizer
module directs the at least one computer to recognize chemical name
fragments by using common chemical word endings. 

31. (Previously Presented) The system as in claim 19, where application of said regular 
expressions and rules results in punctuation characters being one of maintained or removed from 
between chemical name fragments as a function of context. 

32. (Previously Presented) The system as in claim 19, where said regular expressions comprise a
plurality of patterns, individual ones of which are comprised of at least one of characters,
numbers and punctuation.

33. (Previously Presented) The system as in claim 32, where the punctuation comprises at least 
one of a parenthesis, a square bracket, a hyphen, a colon and a semi-colon. 

34. (Previously Presented) The system as in claim 32, where the characters comprise upper case 
C, O, R, N and H.

35. (Previously Presented) The system as in claim 32, where the characters comprise lower case 
xy, ene, ine, yl, ane and oic. 

36. (Currently Amended) The system as in claim 19, further comprising an input tokenizer
module comprised of computer instructions directing the at least one computer to
receive documents to be processed to provide a sequence of tokens. 

37.-41. (Cancelled)

42. (Previously Presented) The system as in claim 43, where said plurality of dictionaries 
consists of the dictionary of common chemical prefixes, the dictionary of common chemical 
suffixes, and a dictionary of stop words to eliminate erroneous chemical name fragments. 

43. (Currently Amended) A system comprising a plurality of computers at least two of which are
coupled together through a data communications network, said system comprising:

a tokenizer and a token processing module comprised of computer instructions in
data storage distributed across the plurality of computers directing the plurality of computers to
parse text of each of at least one text document and assign semantic meaning to words of the
parsed sentences, where assigning comprises applying a plurality of regular expressions, rules 
and dictionaries consisting of a common chemical prefix dictionary and a common chemical 
suffix dictionary to recognize chemical name fragments; 

the instructions of the token processing module directing the plurality of
computers to recognize any substructures present in the chemical name fragments; 

the instructions of the token processing module directing the plurality of computers to
extract keywords associated with the recognized chemical name fragments and the substructures 
of the text document and to index the extracted keywords in a text index; 

the instructions of the token processing module directing the plurality of computers to
add each of the recognized chemical name fragments and the substructures that do not contain a 
number to the text index; 

the instructions of the token processing module directing the plurality of computers to
determine structural connectivity information of each of the recognized chemical name fragments
and the substructures that do not contain a number;

the instructions of the token processing module directing the plurality of computers to
index representations of the recognized chemical name fragments and the substructures in
association with the determined structural connectivity information into a plurality of chemical
connectivity tables of a chemical substructure index, where indexing the representations
comprises: 

in a loop, testing each of the recognized chemical name fragments in a first text document 
of the at least one text document to see if the recognized chemical name fragment occurs 
in a dictionary of SMILES fragments, where if it does then a SMILES expression for the 
fragment token is added to the chemical substructure index, then 

determining if the recognized chemical name fragment occurs in a MOL file dictionary, 
where if it does then a MOL file expression for the fragment token is added to the 
chemical substructure index, and then 

determining if there is a next text document of the at least one text document, where if 
there is a next text document then testing, as stated above, each of the recognized 
chemical name fragments in the next text document and where if there is no next text 
document the indexing is completed;

the instructions of the token processing module directing the plurality of computers to
store the text index in association with the chemical substructure index; 

a searcher module comprised of computer instructions distributed across the plurality of
computers and a graphical user interface comprised of a display and a keyboard connected to a
computer of the plurality of computers directing the plurality of computers to search
the text index and the chemical substructure index, where the search comprises first entering 
search terms comprising one or more chemical fragment names and then selecting graphical 
representations of one or more substructures, where the selecting comprises using the graphical 
user interface as a pointer to a graphical list of substructures; and 

the graphical user interface configured to receive a search result, where the search result is an 
intersection of the chemical substructure index and the text index, identifying at least one 
document where there are found chemical compounds that contain a reference to the search terms 
and the one or more substructures. 

44. (Previously Presented) The system as in claim 43, wherein the search further comprises first 
entering the one or more chemical fragment names and additionally entering at least one 
keyword, and where the search result is identifying at least one document where there are found 
the at least one keyword, the chemical compounds that contain the selected substructures, and the 
connectivity specified by the one or more chemical fragment names and the selected 
substructures. 

45. (Currently Amended) The system as in claim 43, where the instructions of said token
processing module direct the plurality of computers to look up recognized
fragments and substructures in a structure dictionary. 

46. (Cancelled) The system as in claim 43, where the indexing representations of the recognized
chemical name fragments and the substructures comprises: 

testing if each of the recognized chemical name fragments occur in a SMILES fragment 
dictionary, where if it does occur in the SMILES fragment dictionary then adding the chemical 
name fragment to the chemical substructure index as the SMILES representation, and 
testing if each of the recognized chemical name fragments occur in a MOL file fragment 
dictionary, where if it does occur in the MOL file dictionary then adding the chemical name 
fragments to the chemical substructure index as the MOL file representation. 